Feature Selection

An information-theoretic approach to feature selection identifies those features whose values best differentiate between class values. This task is accomplished through a calculation such as this:

IG(A_i) = Σ_j P(A_i = V_ij) ×
          [ (−Σ_k P(A_c = V_ck) log2 P(A_c = V_ck))
          − (−Σ_k P(A_c = V_ck | A_i = V_ij) log2 P(A_c = V_ck | A_i = V_ij)) ]

where:

A   = {A_i  | A_i is one of i independent variables}
V_i = {V_ij | V_ij is one of j possible values of A_i}
A_c = {V_ck | V_ck is one of k possible values of the class attribute A_c}

Practitioners frequently refer to the calculated value as information gain. In our example, the A_i consist of Attendance, Production, and so on; V_i for Attendance consists of the set {Poor, Average, Outstanding}; and A_c = Merit Raise, with values P and N. Here's the information gain for each of the employee database features:

Feature       Feature Values                                               Information Gain
Attendance    Poor: P=2, N=1   Average: P=1, N=3   Outstanding: P=5, N=2   0.0537
Production    Poor: P=1, N=1   Average: P=3, N=2   Outstanding: P=4, N=3   0.0012
Cooperation   Poor: P=0, N=3   Average: P=3, N=3   Excellent: P=5, N=0     0.284
Waste         Low: P=6, N=5    High: P=2, N=1                              0.00043
Quality       Poor: P=4, N=4   (remaining values truncated in the source)  0.0532
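The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the textbook (Quinlan-style) information-gain formula, using the class counts from the employee table; note that the absolute values it prints differ from the table's figures, which appear to have been computed with a different convention, though the ranking it produces (Cooperation most informative) agrees with the table. The function names and list layout are this sketch's own, not from the article.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def information_gain(class_counts, partitions):
    """IG(A_i) = H(A_c) - sum over feature values of P(value) * H(A_c | value).

    class_counts: overall [P, N] counts for the class attribute A_c.
    partitions:   one [P, N] count pair per value of the feature A_i.
    """
    total = sum(class_counts)
    conditional = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(class_counts) - conditional

# Merit Raise classes: 8 P and 6 N overall (column sums from the table).
overall = [8, 6]

# Attendance: Poor (P=2, N=1), Average (P=1, N=3), Outstanding (P=5, N=2)
attendance = [[2, 1], [1, 3], [5, 2]]
# Cooperation: Poor (P=0, N=3), Average (P=3, N=3), Excellent (P=5, N=0)
cooperation = [[0, 3], [3, 3], [5, 0]]

print(round(information_gain(overall, attendance), 4))   # prints 0.1251
print(round(information_gain(overall, cooperation), 4))  # prints 0.5567
```

Cooperation scores highest under this formula, as in the table, because its Poor and Excellent values separate the classes perfectly (zero conditional entropy), leaving only the mixed Average group.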